Abstract

A target recognition capability is described
which performs: color detection, target type
and pose hypothesis generation, and target verification
by 3D alignment of target models to
range and optical imagery. The term 'coregistration'
is introduced to describe target, range
and optical sensor alignment. The following
key verification components are described and
demonstrated: target-model feature extraction,
model-driven edge detection, range, optical and
target coregistration, and coregistration space
matching. As a precursor to future incorporation
of terrain data, the ability to match terrain
features to imagery from the UGV Demo C test
site is demonstrated.

1  Introduction 

Our  goal  is  the  development  of  new  Automatic  Target 
Recognition  (ATR)  algorithms  which  are  more  robust 
with  respect  to  scene  clutter,  target  occlusion  and  vari¬ 
ations  in  viewing  angle.  The  heart  of  our  approach  is  to 
fuse  range  and  optical  imagery,  color  or  IR,  using  global 
geometric  constraints.  Constraints  derive  from  known 
sensor,  target  and  scene  geometry.  This  may  be  thought 
of  as  model-based  sensor  fusion,  and  contrasts  with  more 
traditional  approaches  which  attempt  to  fuse  data  based 
upon low-level cues only [EG92].

The  roots  of  our  approach  lie  in  past  alignment  based 
object  recognition  research  [Low85,  HU90,  BR95]  which 
has  demonstrated  the  value  of  algorithms  which  precisely 
vary 3D object to sensor alignment as part of recognition.
While  this  paradigm  is  dominant  in  many  domains,  it 
is  surprisingly  absent  from  work  on  ATR.  Instead,  ATR 
is dominated by systems which employ fixed sets of image
space templates or probe sets: each derived from a
slightly different viewpoint. We believe that integrating
3D alignment into the recognition is superior to using
large sets of image-space templates, and we are therefore
developing new algorithms to accomplish this in the
context of ATR.

This work was sponsored by the Advanced Research
Projects Agency (ARPA) under grants DAAH04-93-G-422
and DAAH04-95-1-0447, monitored by the U. S. Army Research
Office.

Geometrically precise alignment techniques are computationally
expensive. To limit their use, up-stream processing
focuses attention so alignment is applied sparingly
as a final means of resolving conflicting hypotheses.
Consequently, there are two other significant efforts
associated  with  this  project.  The  first  is  a  target  detec¬ 
tion  effort  being  led  by  the  University  of  Massachusetts. 
The  second  is  a  hypothesis  generation  effort  being  led  by 
Alliant  Techsystems. 

Recently  we  have  also  begun  looking  at  ways  to  add 
terrain  model  constraints  into  the  recognition  process.  A 
first  step  has  been  taken  by  adapting  existing  matching 
capabilities  to  a  restricted  but  quite  important  problem. 
To date, the precise pointing angle of
the Unmanned Ground Vehicle (UGV) relative to the
terrain map has been determined by hand. At the end
of  this  report  there  is  a  brief  overview  of  some  recent 
experiments  suggesting  a  practical  way  of  automating 
this  process. 

2  Alignment  and  ATR 

While  adapting  the  alignment  paradigm  to  ATR  might 
at first seem to be a simple transfer from one application
domain to another, it is not. By their nature, ATR
problems  are  more  difficult  than  those  typically  solved 
using  alignment  algorithms.  In  ATR,  image  resolution 
is  typically  low.  Targets  viewed  in  color  imagery  are 
textured,  in  FLIR  appearance  is  highly  variable,  and  in 
range  imagery  geometric  form  is  often  complex.  CAD 
models  of  targets  are  typically  available,  but  often  con¬ 
tain  excessive  detail.  Terrain  features  introduce  clutter 
and  targets  are  often  partially  occluded.  These  factors, 
plus  the  fundamental  ambiguities  associated  with  per¬ 
spective  mapping  of  small  objects  into  optical  imagery, 
make  direct  application  of  current  alignment  algorithms 
infeasible. 

To  counter  some  of  these  difficulties,  optical  imagery 
may  be  supplemented  with  range  imagery.  Direct  3D 
data resolves many ambiguities inherent in optical imagery.
However, the introduction of a second sensor
complicates  the  problem  by  introducing  the  need  to  fuse 
data  from  heterogeneous  sensors.  Fusion  might  be  ac¬ 
complished  by  processing  range  and  optical  imagery  sep¬ 
arately  and  combining  evidence  after  the  fact.  However, 
doing  so  throws  away  the  possibility  of  coupling  evidence 
through  the  known  3D  sensor  and  target  geometry. 

To  exploit  constraints  derived  from  scene  and  target 
geometry,  we  are  developing  new  algorithms  which  geo¬ 
metrically  align  3D  target  models  with  both  range  and 
optical  image  data.  This  alignment  is  performed  so  as 
to  maintain  global  geometric  constraints  associated  with 
known  sensor  and  scene  geometry.  Some  geometric  con¬ 
straints  are  precisely  calibrated  while  others  are  not  com¬ 
pletely  specified.  For  example,  the  3D  position  and  ori¬ 
entation  of  the  target  relative  to  the  sensors  obviously 
varies.  Also,  as  long  as  separately  mounted  range  and 
optical  sensors  are  used,  exact  pixel  registration  between 
images  can  be  expected  to  vary.  Thus,  estimates  of  3D 
object  pose  as  well  as  image  registration  should  be  re¬ 
fined  as  part  of  the  alignment  process.  As  a  shorthand, 
we  have  coined  the  term  coregistration  to  describe  this 
process  of  simultaneously  refining  these  estimates  based 
upon  corresponding  target  and  sensor  features. 

3  Components 

To  develop  a  complete  end-to-end  Target  Recognition 
capability,  recognition  is  divided  into  the  three  stages 
outlined  below: 

Color  Target  Detection  Given  color  imagery,  or  im¬ 
agery  from  any  multi-band  sensor,  determine  re¬ 
gions  of  interest  (ROIs)  where  targets  might  be 
present.  While  initiating  ATR  with  a  detection 
phase  is  standard,  two  things  are  novel  about  the 
approach taken here. First, new machine learning
technology for building multi-variate decision trees
is  being  adapted  to  the  problem  of  target  detection. 
Second,  color  imagery  is  being  utilized.  Results  to 
date  often  show  camouflaged  vehicles  can  be  distin¬ 
guished  from  natural  backgrounds  even  when  each 
is  in  a  gross  sense  the  same  color:  green  camou¬ 
flage  against  green  grass  and  brush.  Color  detection 
is  relatively  mature  and  has  been  integrated  and 
demonstrated  running  on  the  Unmanned  Ground 
Vehicle.  The  color  detection  effort  is  led  by  the 
University  of  Massachusetts.  The  general  approach 
to  target  detection  as  laid  out  in  [BDHR94]  is  based 
upon  more  general  work  on  the  use  of  learned  multi¬ 
variate  decision  trees  in  computer  vision  [DBU94]. 

Hypothesis  Generation  Given  regions  of  interest 
generated  by  the  color  detection  process,  or  any 
other  detection  algorithm,  this  second  stage  hy¬ 
pothesizes  what  type  or  types  of  vehicles  may  be 
present and at what positions and orientations relative
to the sensors. To provide this capability, a
LADAR boundary template probing algorithm is
being  utilized  which  is  itself  an  accomplished  ATR 
algorithm  [BJLP92].  This  algorithm  does  the  best 
it  can  to  reduce  the  possibilities  and  then  its  top 
hypotheses  are  passed  onto  the  final  coregistration 
verification  stage.  Adding  this  third  stage  takes  the 
pressure  of  making  the  final  decision  regarding  tar¬ 
get  type  off  the  boundary  probing  algorithm  and 
means  the  algorithm  operates  under  different  perfor¬ 
mance  constraints:  it  needs  to  generate  a  hypothesis 
which  is  approximately  correct,  but  need  not  rank 
the top alternatives perfectly. Alliant Techsystems
is  leading  the  work  on  this  algorithm. 

Coregistration  Target  Verification  This  is  the  most 
intricate,  computationally  demanding,  and  novel  as¬ 
pect  of  our  system.  It  takes  as  input  the  target  ID 
and  pose  hypotheses  generated  by  the  previous  step, 
and  refines  each  based  upon  coregistration  of  the 
target  model,  optical  imagery,  and  range  imagery. 
The  output  is  an  exact  match  between  sensor  fea¬ 
tures  and  3D  target  features  and  a  quality  of  match 
measure  based  upon  the  associated  3D  alignment  of 
features.  First  instantiations  of  all  the  subcompo¬ 
nents  needed  to  perform  coregistration  verification 
have  been  developed  and  integrated  into  a  single 
testbed. Each of these subcomponents represents a
separate research project and is described further below.

To accomplish target verification through coregistration,
a set of component technologies has been developed.
These are summarized below:

Target  Model  Feature  Prediction  CAD  models  of 
targets  do  not  explicitly  represent  the  types  of  in¬ 
formation  required  to  do  matching  in  optical  and 
range  imagery.  An  on-line  algorithm  for  generating 
sampled-surfaces  for  matching  to  range  data  and  3D 
silhouette  features  suitable  for  matching  to  optical 
imagery  has  been  developed. 

Model-Driven  Image  Feature  Extraction  In  opti¬ 
cal  imagery,  bottom-up  feature  extraction  is  prob¬ 
lematic  at  best.  Consequently,  a  model-driven  edge 
detection  and  line  extraction  process  has  been  im¬ 
plemented  which  seeks  locally  optimal  silhouette 
features  in  the  optical  imagery. 

Range,  Optical  and  Target  Coregistration 

Extending  past  work  on  3D  pose  determina¬ 
tion  [KH94],  a  new  least-squares  algorithm  has 
been  developed  which  determines  the  3D  pose 
of  the  target  relative  to  a  range  and  an  optical 
sensor  and  simultaneously  adjusts  the  registration 
mapping  between  the  sensor  image  planes  [SB94]. 
A  recent  extension  of  this  work  to  perform  median 
filtering  has  dramatically  improved  the  quality  of 
the  results  [Ant96b,  Ant96a]. 
Coregistration Space Matching A match quality
measure  has  been  formalized  to  evaluate  alternative 
coregistration  estimates  based  upon  fidelity  of  the 
target  model  to  the  sensor  data.  A  local  search  oper¬ 
ator  in  the  space  of  coregistration  mappings  is  then 
used  to  find  better  matches  between  target  models 
and  sensor  features.  This  work  is  further  described 
below,  in  [BSS96]  and  elsewhere  in  these  proceed¬ 
ings  [Mar96a]. 

Each  of  the  components  outlined  above  is  a  focus  of 
research  and  the  progress  in  each  area  is  summarized  in 
the  following  sections. 

4  Progress  on  Key  Components 

4.1  Color  Detection 

The  essential  elements  of  this  work  along  with  results  on 
data  collected  by  ourselves  and  Martin  Marietta  at  Fort 
Carson  [BPY94a]  were  reported  in  the  previous  Image 
Understanding  Workshop  Proceedings  [BDHR94].  Since 
this  initial  description  of  the  color  detection  work,  the 
following  has  been  accomplished: 

1.  An  improved  way  of  coalescing  individual  pixel  de¬ 
tections  into  ROIs  has  been  implemented. 

2.  The  algorithm  has  been  successfully  integrated  with 
the  other  Reconnaissance,  Surveillance  and  Target 
Acquisition  software  running  on  the  UGV  and  has 
been  demonstrated  to  run  reliably  in  the  field. 

3.  The  algorithm  is  being  formally  evaluated  in  an  in¬ 
dependent  effort  being  led  by  Ted  Yachik  of  LG  A. 

4.1.1  Operating  Scenario 

It  is  best  to  start  by  reviewing  the  basic  operating  sce¬ 
nario  for  the  color  detection  system.  First,  it  is  assumed 
that  training  imagery  is  obtained  prior  to  a  fielded  mis¬ 
sion,  and  based  upon  this  training  data  the  system  learns 
to  discriminate  between  color  values  produced  by  camou¬ 
flaged  vehicles  and  values  produced  by  background  ter¬ 
rain.  The  result  of  this  training  is  a  color  lookup  ta¬ 
ble  (LUT)  indicating,  for  each  possible  RGB  color  pixel 
value,  whether  it  is  more  likely  to  be  produced  by  a  tar¬ 
get  or  background. 

In  fielded  operation,  the  system  performs  real-time 
color  lookup  on  all  pixels  coming  in  and  classifies  them  as 
target  or  background.  Then,  an  ROI  extraction  process 
sums  responses  over  fixed  sized  windows  in  the  image 
and  extracts  ROIs:  one  ROI  for  each  local  maximum  in 
this  summed  response  image  which  is  over  a  minimum 
threshold.  When  integrated  with  the  RSTA  package  on 
the  UGV,  the  results  of  the  color  detection  were  com¬ 
bined  with  those  of  a  traditional  FLIR  detection  algo¬ 
rithm. 
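
The fielded detection loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code that ran on the UGV: the function names (`classify`, `window_sums`, `extract_rois`), the dictionary-based LUT, the window size, and the 8-neighbour local-maximum test are all hypothetical choices made for clarity.

```python
def classify(image, lut):
    """Label each RGB pixel 1 (target) or 0 (background) using the lookup table.

    Colors absent from the (hypothetical) dictionary LUT default to background.
    """
    return [[lut.get(pixel, 0) for pixel in row] for row in image]


def window_sums(mask, size):
    """Sum pixel detections over every size x size window (top-left anchored)."""
    rows, cols = len(mask), len(mask[0])
    sums = []
    for r in range(rows - size + 1):
        sums.append([
            sum(mask[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(cols - size + 1)
        ])
    return sums


def extract_rois(sums, threshold):
    """Return (row, col) of each window that reaches the threshold and is a
    local maximum among its 8 neighbours (ties are kept)."""
    rows, cols = len(sums), len(sums[0])
    rois = []
    for r in range(rows):
        for c in range(cols):
            value = sums[r][c]
            if value < threshold:
                continue
            neighbours = [
                sums[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value >= n for n in neighbours):
                rois.append((r, c))
    return rois
```

A real LUT would cover every possible RGB value and be indexed directly rather than through a dictionary; the dictionary simply keeps the sketch short.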

IR and color detection complement each other, since
false positives do not tend to correlate. For IR, false
positives are typically produced by objects such as rocks,
which heat up in the sun, or reflect solar energy back into
the IR sensor¹. For color, typically cool objects such as
shrubs and trees tend to generate false positives.

Min   Max   Median   Mean   S.D.
3     41    13       15     9.5

Table 1: Detection Statistics on 51 Demo C Test Images.
No true target was missed in this test.

Perhaps  the  most  important  factor  in  evaluating  the 
usefulness  of  color  detection  concerns  the  degree  to  which 
training  generalizes  to  variations  in  field  conditions.  The 
current  system,  using  a  single  LUT,  has  been  demon¬ 
strated  to  generalize  across  times  of  day,  lighting  condi¬ 
tions,  weather  and  vehicles.  Results  demonstrating  this 
on  color  imagery  obtained  from  35mm  film  have  been 
previously  reported  [BDHR94,  BHP95],  and  more  recent 
results  obtained  using  the  color  CCD  sensor  on  the  UGV 
are  summarized  below. 

The  current  system  does  not  generalize  across  sensors, 
but  instead  is  trained  to  the  specific  response  of  a  par¬ 
ticular  sensor.  This  is  primarily  a  matter  of  experience, 
since  generalization  between  sensors  requires  effort  be 
devoted  to  the  problem  of  cross-sensor  color  calibration. 
In  principle  such  calibration  can  be  done,  but  there  are  a 
variety  of  subtle  issues  involved  which  make  this  its  own 
topic  for  future  research. 

4.1.2  Experience  Running  on  SSV-B 

One  way  to  illustrate  our  confidence  in  the  potential 
of  the  color  detection  system  is  to  simply  recount  our 
first  experiences  with  the  system  running  as  part  of  the 
UGV RSTA package. After a significant software integration
effort, the system was finally integrated and debugged
by mid June, 1995. On Tuesday, June 13, 18 training images
were collected using SSV-B. On Wednesday morning,
June 14, an hour was taken to select 14 image chips:
3  indicating  typical  background  colors  and  11  showing 
vehicles.  A  color  LUT  was  built,  loaded  on  the  vehicle, 
and the system was tested from 1 to 5 PM on 51 new
images.  These  51  images  included  targets  not  in  the 
training  data:  both  with  brown  and  green  camouflage 
and  viewed  from  vantage  points  different  from  those  in 
the  training  data. 

The key result was that over the 4 hour period, under
both cloudy and sunny conditions, viewing 4 different
targets from 2 different vantage points, the system never
missed a target. This first field result was positive beyond
our expectations. While perfect performance such
as  this  is  not  a  realistic  expectation  in  general,  it  is  sug¬ 
gestive  of  the  strength  of  the  system.  Tight  timing  con¬ 
straints  associated  with  scheduling  of  SSV-B  leading  up 
to  Demo  C  prevented  further  field  testing. 

Because the system was tuned to work with FLIR,
a high false positive rate was considered acceptable as
a way of reducing the chance of missed targets. Table 1
provides statistics summarizing the detection performance
over the 51 test images. The columns present
the minimum number of detections on a single image,
maximum number of detections on a single image, the
median number across the image set, the average number,
and the standard deviation.

¹ The RSTA FLIR operates in the 3 to 5 micron band and
is sensitive to reflected thermal energy.

Figure 1: Color Detection Example in UGV Data.
Summed detection values from which ROIs are derived.

To  illustrate  how  these  detection  ROIs  appear,  the 
ROIs  found  for  a  typical  image  from  the  June  tests  at 
the Demo C site are shown in Figure 1a². The summed
response producing these ROIs is shown in Figure 1b.
Because  each  ROI  is  relatively  small,  even  for  those  im¬ 
ages  with  high  numbers  of  detections,  the  color  detection 
algorithm  is  focusing  attention  on  a  very  small  percent¬ 
age  of  the  total  image. 

4.2  Hypothesis  Generation 

The  LADAR  boundary  probing  system  developed  by 
Alliant  Techsystems  has  been  modified  to  run  as  a  stand 
alone  system  on  a  Sparc  workstation.  James  Steinborn 
at  Colorado  State,  and  Kris  Siejko  at  Alliant  Techsys¬ 
tems,  have  been  working  to  make  the  system  operational 
in  this  stand-alone  mode  running  under  Solaris.  They 
have  also  been  making  enhancements,  including  a  new 
visualization  tool  developed  by  Jim  Steinborn  which  en¬ 
ables  us  to  better  understand  the  geometric  structure  of 
the  probe  templates. 

Preliminary tests of the system have been performed on
LADAR images from the Fort Carson data set [BPY94b].
Templates for the M113 and M60 generated at 52 meters
were  selected  for  all  the  tests.  Templates  generated  at 
closer and farther ranges were tried as well, but the results
changed little. The probing algorithm does its own
internal template scaling based upon the LADAR data
itself. There were a total of 72 templates: 36 of each
vehicle sampled at 10 degree aspect increments.

² Many of the figures in this paper look much better
in color and can be accessed through the CSU homepage:
http://www.cs.colostate.edu/~vision/

The results for this test are summarized in Table 2³.
The hypothesis generation algorithm produces a set of
ranked hypotheses. The top five are indicated as pairs:
(vehicle)/(aspect angle). The best hypothesis, in terms
of the true type and aspect of the vehicle, is indicated
in boldface in Table 2.

The  correct  hypotheses  are  not  the  highest  ranked  hy¬ 
potheses  based  upon  the  boundary  probe  result  alone. 
However,  a  hypothesis  for  the  correct  target  within  10 
degrees  of  the  true  aspect  angle  does  appear  in  the  top 
five  for  all  4  images.  While  the  performance  of  the 
boundary  probing  could  no  doubt  be  improved  through 
more  careful  tuning  to  the  Fort  Carson  data,  this  ex¬ 
periment  demonstrates  the  system  is  capable  of  focusing 
attention  upon  a  small  set  of  aspect  and  target  hypothe¬ 
ses. 
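
The acceptance criterion used above (correct vehicle, aspect within 10 degrees, somewhere in the top five) can be made concrete with a short sketch. The helper names and the ground-truth aspect used in the usage note are hypothetical; the only facts taken from the text are the two vehicle types, the 10 degree template spacing, and the 10 degree tolerance. The angular difference wraps at 360 degrees so that, for example, aspects of 350 and 10 are treated as 20 degrees apart.

```python
def aspect_difference(a, b):
    """Smallest absolute difference between two aspect angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def build_templates(vehicles, step=10):
    """Enumerate (vehicle, aspect) templates at a fixed aspect increment."""
    return [(v, a) for v in vehicles for a in range(0, 360, step)]


def first_acceptable(ranked, true_vehicle, true_aspect, tolerance=10):
    """Return the 1-based rank of the first hypothesis with the correct vehicle
    and an aspect within `tolerance` degrees of truth, or None if there is none."""
    for rank, (vehicle, aspect) in enumerate(ranked, start=1):
        if vehicle == true_vehicle and aspect_difference(aspect, true_aspect) <= tolerance:
            return rank
    return None
```

For instance, given a ranked list such as `[("M113", 30), ("M113", 20), ("M113", 40), ("M60", 0), ("M113", 10)]` and an assumed true aspect of 15 degrees for an M113, `first_acceptable` reports the second hypothesis as the first acceptable one.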

4.3  Model  Feature  Prediction 

Highly detailed Constructive Solid Geometry (CSG)
models of target vehicles are available in BRL-CAD format
[U. 91]. These detailed models are a tremendous asset.
However, much work is required to transform these
CSG  models  into  features  appropriate  for  matching  to 
sensor  data.  Over  the  past  year,  Mark  Stevens  has  de¬ 
veloped  a  semi-automated  system  for  transforming  the 
CSG  to  a  polygonal  representation  [SBG95,  Ste95].  He 
has  also  developed  a  fully  automated  system  for  extract¬ 
ing  edge  and  surface  information  from  these  polygonal 
models. This latter system is summarized here and more
fully  described  elsewhere  in  these  proceedings  [Mar96a]. 

For  optical  imagery,  3D  face  boundaries  likely  to  gen¬ 
erate  observable  edges  in  imagery  are  extracted  from  the 
3D  target  models.  This  is  done  on-line  given  an  estimate 
of the target pose and lighting. Target pose is produced
by the hypothesis generation phase described above and
is refined as part of matching as discussed below in Section 4.6.
Lighting information for outdoor scenes may
be derived from collateral information regarding time-of-day,
time-of-year, place and weather.

³ The LADAR and optical imagery for image 4 appears in
Figure 3.

No.  File        Annotation          Hyp 1     Hyp 2     Hyp 3     Hyp 4    Hyp 5
1    nov21115L1  Clear Angle On      M113/10   M113/40   M113/120  M113/30  M113/50
2    nov40755L1  Nose Down Profile   M113/120  M113/110  M113/60   M113/50  M113/70
3    nov31055L1  Clear Profile       M113/120  M113/60   M113/50   M113/70  M113/80
4    nov31170L1  Head On Nose Down   M113/30   M113/20   M113/40   M60/0    M113/10

Table 2: Hypothesis Generation Results for 4 Fort Carson Images. The target in all cases is an M113. The top
five hypotheses are shown ranked as (vehicle)/(aspect angle). The hypothesis indicated in boldface is within 10
degrees of the true aspect angle.

For  range  imagery,  sampled  surfaces  are  extracted 
from  the  3D  model  using  a  process  which  simulates  the 
operation  of  the  actual  range  sensor.  The  target  model 
is  transformed  into  the  range  sensor’s  coordinate  system 
using  the  initial  estimate  of  the  target’s  pose  and  rays 
are  cast  into  the  scene  and  intersected  with  the  3D  faces 
of  the  target  model.  Sampling  geometry  is  selected  to 
reflect  the  characteristics  of  the  actual  range  device. 
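
As a rough illustration of the ray-casting step, the sketch below intersects a grid of sensor rays with a triangulated model and keeps the nearest hit per ray. It is written under several assumptions that are not from the paper: the model is already expressed in the range sensor's coordinate system with the sensor at the origin looking along +z, faces are triangles, the sampling grid is given as lists of azimuth and elevation angles in radians, and the intersection test is the standard Moller-Trumbore method rather than whatever the actual system uses.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: distance along the ray to the triangle, or None."""
    v0, v1, v2 = tri
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:           # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    t_vec = _sub(origin, v0)
    u = _dot(t_vec, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(t_vec, e1)
    v = _dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv
    return t if t > eps else None


def sample_range_image(triangles, azimuths, elevations):
    """Cast one unit-length ray per (elevation, azimuth) pair from the origin
    and record the range to the nearest face, or None if nothing is hit."""
    image = []
    for el in elevations:
        row = []
        for az in azimuths:
            d = (math.cos(el) * math.sin(az), math.sin(el), math.cos(el) * math.cos(az))
            nearest = None
            for tri in triangles:
                t = ray_triangle((0.0, 0.0, 0.0), d, tri)
                if t is not None and (nearest is None or t < nearest):
                    nearest = t
            row.append(nearest)
        image.append(row)
    return image
```

Choosing `azimuths` and `elevations` to match the angular spacing of the real device is what makes the predicted samples comparable to the measured range pixels.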

4.3.1  Silhouettes  and  Internal  Structure 

To match in 3D rather than 2D image space, features
must  be  3D  and  not  flattened  2D  templates.  Therefore, 
the  model  feature  prediction  system  determines  those  3D 
features  within  the  target  model  responsible  for  generat¬ 
ing  the  target  silhouette  for  a  given  pose  estimate.  These 
3D  features  naturally  accommodate  modest  changes  in 
viewing  angle.  If  the  expected  pose  estimate  changes 
significantly,  new  features  may  be  generated.  The  first 
version  of  the  feature  prediction  system  extracts  only 
features  associated  with  the  silhouette.  More  recent 
work,  described  in  more  detail  elsewhere  in  these  pro¬ 
ceedings  [Mar96a],  extends  this  method  to  add  signifi¬ 
cant  internal  structure  as  well. 

The  silhouette  prediction  algorithm  begins  by  assign¬ 
ing  a  unique  color  to  each  face  in  the  target  model.  This 
color  acts  as  an  index  into  a  hash  table  of  3D  faces.  The 
model  is  then  rendered  from  the  hypothesized  viewing 
angle  using  orthographic  projection  and  a  hardware  Z- 
buffer.  A  target  model  with  250  faces  can  be  rendered 
in 1.2 seconds on a Sparc 10 with a ZX hardware accelerator.

Pixels  adjacent  to  the  unique  background  color  indi¬ 
cate  faces  contributing  to  the  target  silhouette.  These 
faces  are  in  turn  checked  to  determine  which  face  bound¬ 
aries  (edges)  contribute  to  the  silhouette.  These  edges 
are  then  clipped  to  retain  only  those  segments  lying  on 
the silhouette. Since rendering is orthographic, parametric
values for clipping measured in the image space
may  be  applied  directly  to  the  corresponding  3D  edges. 
The  final  result  is  a  list  of  3D  edges  representing  the 
silhouette  of  the  target  model  for  a  given  viewing  angle. 
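
The first step of the silhouette procedure, finding which faces touch the background in the color-coded rendering, can be sketched as follows. This assumes the rendering has already been read back as a 2D array of integer face identifiers (standing in for the unique colors), that identifier 0 marks the background, and that pixels beyond the image border count as background; these conventions and the function name are illustrative only. The subsequent edge selection and clipping steps are not shown.

```python
BACKGROUND = 0  # hypothetical identifier for the unique background color


def silhouette_faces(id_buffer):
    """Return the set of face identifiers having at least one pixel that is
    4-adjacent to the background (or to the edge of the image)."""
    rows, cols = len(id_buffer), len(id_buffer[0])
    faces = set()
    for r in range(rows):
        for c in range(cols):
            face = id_buffer[r][c]
            if face == BACKGROUND:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < rows and 0 <= cc < cols)
                if outside or id_buffer[rr][cc] == BACKGROUND:
                    faces.add(face)
                    break
    return faces
```

Each identifier returned would then be used to look up the corresponding 3D face in the hash table, whose boundary edges are the candidates for the silhouette.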


4.4  Model-Driven  Image  Feature  Extraction 

Bottom-up  line  extraction,  such  as  performed  by  the 
Burns  algorithm  [BHR86],  is  unreliable  in  imagery  such 
as  that  shown  in  Figure  2b.  To  overcome  the  difficulty  in¬ 
herent  in  this  imagery,  a  more  model-driven  approach  is 
required.  To  accomplish  this,  we  combine  two  ideas  from 
the  literature:  model-driven  edge  detection  [FL87,  FL88] 
and  directionally  tuned  gradient  filters  [Can86].  The 
quality  of  a  straight  line  segment  denoting  an  extended 
edge  is  defined  to  be  a  function  of  the  gradient  magnitude 
under  that  edge.  A  gradient  mask  tuned  to  the  specific 
expected  orientation  of  the  segment  is  used.  The  place¬ 
ment  of  the  segment  is  perturbed  until  a  locally  optimal 
placement  is  found. 

4.4.1  Placing  Silhouette  Edges 

Initially,  3D  silhouette  edges  are  projected  into  the 
color  image  based  upon  the  known  intrinsic  sensor  pa¬ 
rameters  and  the  estimated  pose  of  the  target.  Figure  2b 
shows  the  projection  of  a  3D  silhouette  onto  the  image 
plane  using  the  sensor  calibration  parameters  [BHP94]. 
The  pose  estimate  comes  from  the  hypothesis  generation 
stage. 

For  each  projected  silhouette  feature,  a  search  is  ini¬ 
tiated  in  the  image  for  the  locally  best  corresponding 
line segment. A gradient mask tuned to the particular
expected orientation of each silhouette edge is
created  by  rotating  the  first  derivative  of  a  Gaussian. 
There are many precedents for tuned edge masks including
Canny [Can86] and Torre and Poggio [TP86] and their use for
bottom-up  edge  detection  [Shu94,  FA91].  An  example 
of  such  a  filter,  displayed  as  an  image,  is  shown  in  Fig¬ 
ure  2a. 

The  placement  of  the  silhouette  edge  is  locally  per¬ 
turbed  so  as  to  maximize  a  function  of  the  underlying 
tuned  gradient  response.  To  do  this  with  subpixel  accu¬ 
racy,  a  commonly  used  graphics  anti-aliasing  technique 
known  as  Pineda  Arithmetic  [Pin88]  is  used  to  weight 
the  contribution  of  individual  pixels.  A  weighting  value 
for  each  pixel  is  created  (see  Figure  2d)  and  the  response 
for  the  silhouette  line  is  the  weighted  sum  of  responses 
at  each  pixel.  Additional  details  on  this  work  appear 
elsewhere  in  these  proceedings  [Mar96b]. 
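
To make the tuned-mask search concrete, the following sketch builds a first-derivative-of-Gaussian mask oriented along an edge normal and shifts a hypothesized segment along that normal to the position of strongest response. It is a simplified stand-in for the procedure described above: it uses whole-pixel shifts and unweighted sums rather than the Pineda subpixel weighting, and the mask radius, sigma, shift range, and function names are assumptions introduced here.

```python
import math


def tuned_mask(theta, sigma=1.0, radius=2):
    """First derivative of a Gaussian taken along direction theta (the edge normal)."""
    mask = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            u = dx * math.cos(theta) + dy * math.sin(theta)  # offset along the normal
            g = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            mask[(dx, dy)] = u * g / (sigma * sigma)
    return mask


def response(image, mask, x, y):
    """Correlate the mask with the image at (x, y); border pixels are replicated."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for (dx, dy), w in mask.items():
        xx = min(max(x + dx, 0), cols - 1)
        yy = min(max(y + dy, 0), rows - 1)
        total += w * image[yy][xx]
    return total


def refine_segment(image, points, theta, max_shift=4):
    """Shift the sampled segment points along the normal and return the
    (shift, score) pair with the largest summed gradient magnitude."""
    mask = tuned_mask(theta)
    nx, ny = math.cos(theta), math.sin(theta)
    best_shift, best_score = 0, -1.0
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for x, y in points:
            px, py = round(x + shift * nx), round(y + shift * ny)
            score += abs(response(image, mask, px, py))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score
```

Because the mask is tuned to the predicted orientation, edges running in other directions contribute little to the score, which is the property that makes the search model-driven rather than bottom-up.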
Figure 2: Gradient Mask and Response. Panels: a. Mask, b. Silhouette Line, c. Gradient Response, d. Weight.


4.5  Range,  Optical  and  Target  Coregistration 

Anthony  Schwickerath  has  developed  a  new  least- 
median-squares  algorithm  for  determining  the  best 
coregistration  estimate  based  upon  a  set  of  correspond¬ 
ing  sensor  and  target  features.  The  motivation  for  this 
work  and  full  mathematical  development  has  been  pre¬ 
sented  previously  [SB94,  BHP95].  Recently,  median  fil¬ 
tering  has  been  included  in  the  algorithm.  In  addition,  a 
match error for ranking alternative correspondence mappings
[Ant96b] has been developed.

The  complete  system  has  now  been  demonstrated  on 
optical  and  range  imagery  collected  at  Fort  Carson  us¬ 
ing  high  quality  range  and  optical  target  model  features 
generated  using  the  system  described  above.  Results 
of  these  improvements  are  reported  in  more  detail  else¬ 
where  in  these  proceedings  [Ant96a].  Here,  allow  us  to 
summarize  briefly  what  coregistration  does,  and  show 
one  result  using  the  new  median  filtering  capability. 

4.5.1  Review  and  Median  Filtering  Example 

Coregistration  addresses  a  fundamental  problem  aris¬ 
ing  when  optical  and  range  imagery  is  collected  from  sep¬ 
arate  co-located  sensors.  Off-line  calibration  can  largely 
determine  the  sensor-to-sensor  registration,  but  some 
small  variations  may  be  expected  during  field  operations. 
Thus, multi-sensor pose determination must include degrees
of freedom to express movement of the target relative
to the sensors, as well as degrees of freedom to permit
fine adjustments to the image-to-image registration
between the sensors.

Our  specific  formulation  of  this  problem  introduces  a 
coplanarity  constraint  which  limits  the  freedom  of  move¬ 
ment  of  the  range  sensor  relative  to  the  optical  sensor. 
Thus  the  range  reference  coordinate  system  may  trans¬ 
late  in  the  common  x-y  image  plane  of  the  two  sensors, 
but otherwise the two sensors move together. In total, six
degrees  of  freedom  express  the  position  and  orientation 
of  the  target  relative  to  the  sensor  suite  and  two  degrees 
of  freedom  permit  translation  of  the  optical  image  plane 
with  respect  to  the  range  image  plane. 

This  choice  of  parameterization  may  at  first  seem  odd. 
Given  two  sensors  on  a  common  platform,  it  is  their  rel¬ 
ative  pointing  angles,  not  their  relative  spacing,  which 
is most likely to vary. However, the pixel-to-pixel movement
between the two image planes is virtually indistinguishable
in the two cases when rotations are small. The
advantage  of  this  translation  formulation  is  that  it  does 
not  introduce  a  second  rotation  term  into  the  coregistra¬ 
tion  formulation,  which  would  in  turn  add  unnecessary 
nonlinearities. 
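
The claim that a small change in pointing angle is nearly indistinguishable from an image-plane translation can be checked numerically. The sketch below assumes an ideal pinhole camera and a pure pan rotation; the focal length, pan angle, and region width used in the usage note are hypothetical values chosen only for illustration. A pixel at horizontal offset u from the image center moves to f * tan(atan(u / f) + pan), and for small pan and small u / f this shift is close to the constant f * pan.

```python
import math


def pixel_shift_from_pan(u, focal_px, pan_rad):
    """Horizontal shift (in pixels) of image coordinate u under a pan rotation."""
    return focal_px * math.tan(math.atan(u / focal_px) + pan_rad) - u


def max_deviation_from_translation(focal_px, pan_rad, half_width):
    """Largest difference, over u in [-half_width, half_width], between the true
    rotational shift and the constant translation focal_px * pan_rad."""
    translation = focal_px * pan_rad
    return max(
        abs(pixel_shift_from_pan(u, focal_px, pan_rad) - translation)
        for u in range(-half_width, half_width + 1)
    )
```

With an assumed focal length of 1000 pixels, a pan of 0.002 radians, and a target region 200 pixels wide, `max_deviation_from_translation(1000.0, 0.002, 100)` stays at a few hundredths of a pixel, while the translation itself is about 2 pixels.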

Figure 3⁴ shows a result of the median filtering coregistration
algorithm on a pair of range and optical images
containing an M113. This is the same data for which
results of the hypothesis generation algorithm were reported
in Table 2: image NOV31170L1⁵. The top image
shows silhouette model features overlaid on the optical
image. The body of the figure shows the coregistered
range features from nine different vantage points
in  order  to  provide  alternative  views  of  the  3D  structure. 
The  filled  rectangles  are  features  included  in  the  least- 
median-squares  match:  light  representing  image  points 
and  dark  representing  model  points.  The  outlined  rect¬ 
angles  are  features  excluded  from  the  match.  The  coreg¬ 
istration  estimate  shown  in  Figure  3  has  correctly  skewed 
the  target  aspect  slightly  to  the  left,  even  though  the  ini¬ 
tial  estimate  was  exactly  head-on. 

4.5.2  Correspondence  Space  Matching 

The  space  of  possibly  corresponding  target  and  sen¬ 
sor  features  is  inherently  combinatoric  and  the  abso¬ 
lute  number  of  possible  correspondences  is  highly  de¬ 
pendent  upon  the  accuracy  of  the  initial  coregistration 
estimate.  To  determine  candidate  features,  range  and 
optical model features are projected into the imagery and
all  image  features  within  some  local  area  are  marked  as 
potential  matches.  The  size  of  these  areas  grows  with 
uncertainty  in  the  coregistration  estimate.  For  median 
filtering  to  be  effective,  the  initial  estimate  must  be  accu¬ 
rate  enough  to  ensure  that  over  50%  of  all  paired  features 
are  part  of  the  true  match. 
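
The candidate-pairing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `candidate_matches`, the circular search area, and the single `radius` parameter standing in for coregistration uncertainty are all assumptions made for the example.

```python
import math

def candidate_matches(projected_model, image_features, radius):
    """Pair each projected model feature with every image feature nearby.

    projected_model, image_features: lists of (x, y) pixel coordinates.
    radius: size of the local search area in pixels (assumed circular here);
            it should grow with uncertainty in the coregistration estimate.
    Returns a list of (model_index, image_index) candidate pairs.
    """
    pairs = []
    for i, (mx, my) in enumerate(projected_model):
        for j, (ix, iy) in enumerate(image_features):
            if math.hypot(mx - ix, my - iy) <= radius:
                pairs.append((i, j))
    return pairs

# Hypothetical data: two projected model features, four image features.
model_pts = [(10.0, 10.0), (50.0, 50.0)]
image_pts = [(11.0, 10.0), (14.0, 10.0), (52.0, 49.0), (200.0, 200.0)]

tight = candidate_matches(model_pts, image_pts, radius=3.0)
loose = candidate_matches(model_pts, image_pts, radius=5.0)
```

Widening the radius from 3 to 5 pixels admits an additional, possibly false, pairing for the first model feature. This illustrates how a poorer initial estimate enlarges the candidate set and makes the 50% requirement harder to meet.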

In  principle,  search  in  this  combinatorial  space  of  cor¬ 
respondence  mappings  could  be  conducted  using  local 
search in a manner analogous to that set forth in
Beveridge’s dissertation [Bev93]. However, as a practical
matter,  the  fine  grained  sampling  of  range  points  causes 
the  combinatorics  to  explode.  While  such  a  local  search 

4 This figure is being improved so grey tones work out
properly.

5 The unusual angle of the vehicle makes this an interesting
and challenging problem.
Figure 3: Coregistration median filtering results for M113 in a pair of LADAR and color images. The top image shows
silhouette features in the color image using the coregistration estimate. The bottom nine views show 3D registration of
target and range features displayed from different viewpoints.

procedure has been implemented, its usefulness is currently
limited, and will only become practical if some form of
initial grouping is applied to the range features. This is
trivial in the case of the target models, where bounded faces
are already represented. For the range data, we are
experimenting with a scan-line range segmentation technique
[X. 94].


Figure 4: Coregistration-space Local Search Results for Shot 20 Array 5 (M113 APC). Panels: (a) Initial Orientation,
(b) Initial Color Pose, (c) Initial LADAR Pose, (d) Resulting Orientation, (e) Resulting Color Pose,
(f) Resulting LADAR Pose.


4.6  Coregistration  Space  Matching 

As an alternative to search in the space of correspondence
mappings between target model and sensor features, this
section introduces a local search algorithm which operates
in the space of coregistration estimates. While increasing
the number of potentially matching features increases the
size of the correspondence space exponentially, the
dimensionality of the coregistration space is fixed:
$\mathbb{R}^8$ for the case treated here. This work is being
pursued as an extension of Mark Stevens’s work on model
prediction and model-driven feature extraction. A more
detailed account of this work appears elsewhere in these
proceedings [Mar96a]. What is presented here is an overview
with an example.

Early  experiments  with  search  in  coregistration-space 
suggest an ability to correct quite large errors in the
initial  coregistration  parameters.  While  this  approach  is 
still  very  early  in  its  development  and  has  been  applied  to 
only  four  pairs  of  range  and  optical  imagery,  this  initial 
experience  is  quite  encouraging. 

If any failing has been observed so far, it is a weakness
in generating the final highly precise match. In other
words, search leads to a much better but not perfect
coregistration estimate. This suggests the two approaches
complement each other: the coregistration-space search first
reaches a near-correct estimate, and median filtering then
produces a highly accurate final result. These two
algorithms are implemented within the same testbed and
combined testing will begin shortly.

The search algorithm locally minimizes an error function
which measures the relationship between the model
and  data  features.  This  measurement  takes  into  account 
both  range  and  optical  features,  but  treats  the  two  cases 
somewhat  differently.  For  the  optical  features,  the  error 
is  a  function  of  the  gradient  measurement  used  in  the 
model-driven  edge  detection  described  in  Section  4.4. 
For  range,  the  error  measure  is  a  function  of  the  Eu¬ 
clidean  distance  from  points  on  the  target  model  sam¬ 
pled  surface  to  their  nearest  neighbor  in  the  range  image 
data. 
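
As an illustration of the range term only, the sketch below computes the mean nearest-neighbor Euclidean distance from sampled model points to range points. The text says only that the error is "a function of" this distance, so the use of a plain mean, the brute-force nearest-neighbor search, and the name `range_error` are assumptions made for the example.

```python
import math

def range_error(model_points, range_points):
    """Mean distance from each sampled model point to its nearest range point.

    model_points, range_points: non-empty lists of (x, y, z) coordinates
    expressed in the same frame. Brute force: O(len(model) * len(range)).
    """
    total = 0.0
    for m in model_points:
        # Distance to the closest range point for this model point.
        total += min(math.dist(m, r) for r in range_points)
    return total / len(model_points)

# Hypothetical data: two model samples and three range returns.
model = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
ladar = [(0.0, 0.0, 10.5), (1.0, 0.0, 9.0), (5.0, 5.0, 5.0)]
err = range_error(model, ladar)
```

Because each model point looks only for its nearest neighbor, distant range returns such as the third point above do not contribute to the error. A practical version would likely use a spatial index rather than an exhaustive scan.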

The  local  search  itself  samples  each  of  the  8  dimen¬ 
sions  of  the  coregistration-space  about  the  current  esti¬ 
mate.  Clearly,  the  step-size  used  in  this  sampling  is  im¬ 
portant.  The  general  strategy  implemented  moves  from 
coarse  to  fine  sampling  as  the  algorithm  converges  upon  a 
locally  optimal  estimate.  The  initial  scaling  of  the  sam¬ 
pling  interval  is  determined  automatically  based  upon 
moment  analysis  applied  to  the  initial  target-model  and 
sensor data sets. The search repeatedly takes the best step
along each dimension until it reaches an optimum. When
no  further  progress  is  possible  along  any  dimension,  the 
resulting  8  values  are  returned  as  the  locally  optimal 
coregistration  estimate. 
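
The coarse-to-fine strategy can be sketched roughly as below. This is an illustrative reading of the description, not the authors' code: the name `local_search`, the halving schedule (`shrink`), the stopping threshold (`min_scale`), and the caller-supplied `step_sizes` are assumptions. In particular, the moment analysis used to set the initial sampling interval is not reproduced here.

```python
def local_search(error, estimate, step_sizes, shrink=0.5, min_scale=1e-3):
    """Coordinate-wise local search with coarse-to-fine sampling.

    error:      callable mapping a parameter list to a scalar error.
    estimate:   initial parameter values (8 for the coregistration case).
    step_sizes: initial sampling interval for each dimension.
    """
    current = list(estimate)
    best = error(current)
    scale = 1.0
    while scale >= min_scale:
        improved = False
        for d in range(len(current)):
            # Sample one step either side of the current value in dimension d.
            candidates = []
            for sign in (-1.0, 1.0):
                trial = list(current)
                trial[d] += sign * scale * step_sizes[d]
                candidates.append((error(trial), trial))
            e, trial = min(candidates, key=lambda c: c[0])
            if e < best:
                current, best, improved = trial, e, True
        if not improved:
            scale *= shrink  # no progress in any dimension: sample more finely
    return current, best

# Hypothetical 8-parameter problem with a known optimum.
target = [0.3, -1.2, 2.0, 0.0, 0.7, -0.4, 1.5, 0.25]
quadratic = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
result, final_error = local_search(quadratic, [0.0] * 8, [1.0] * 8)
```

On this smooth test function the search recovers the optimum to within the final step size. On real imagery the error surface is not convex, so the result is only locally optimal, as the text notes.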

Figure 4 shows a sample result of the coregistration-space
search applied to an image containing the M113. The left
column shows the initial coregistration estimate, both
target pose and sensor registration, provided by the
hypothesis generation algorithm. The right column shows the
result of the local search algorithm. Figures 4a and 4d show
how the process has corrected the orientation of the target.

Figures 4b and 4e show both the silhouette and internal
features. Observe how closely the features in Figure 4e
correspond to the color image. Also note that the change in
appearance reflects the 3D rotation of the target model:
the change between Figures 4b and 4e could not be produced
by a 2D affine transformation of a 2D template.

Figures  4c  and  4f  show  the  movement  of  the  target 
model  sampled  points  relative  to  the  actual  range  data. 
It  is  this  range  data  which  is  providing  much  of  the  con¬ 
straint  used  to  correct  the  orientation  of  the  target.  The 
dark  grey  rectangles  are  model  points,  the  lighter  grey 
rectangles  are  range  image  points.  Because  the  search 
aligns  the  two,  it  is  difficult  to  distinguish  model  from 
data in Figure 4f, while they are easily distinguished in
the initial configuration, Figure 4c.

5  Horizon  Line  Orientation  Correction 

There  are  both  immediate  and  longer  range  benefits  to 
be  realized  if  terrain-based  constraints  are  integrated 
into  the  ATR  process.  One  of  the  most  immediate  needs 
is  strikingly  evident  in  the  RSTA  function  of  the  UGV 
program.  The  current  SSVs  developed  by  Martin  Ma¬ 
rietta  use  GPS  to  determine  position  and  inertial  nav¬ 
igation  to  determine  orientation.  All  but  the  most  ex¬ 
pensive  inertial  systems  only  measure  true  orientation 
to  within,  speaking  loosely,  1  degree.  For  some  purposes 
this is a modest error, but when attempting to register
imagery  from  a  4  degree  field  of  view  sensor  with  stored 
terrain  maps,  such  an  error  introduces  a  128  pixel  error 
in  a  512x512  image. 
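
The 128 pixel figure follows directly from the stated field of view and image size, assuming pixels are spread uniformly across the field of view:

```latex
\[
\frac{512\ \text{pixels}}{4^{\circ}} = 128\ \text{pixels per degree},
\qquad
1^{\circ}\ \text{orientation error} \times 128\ \frac{\text{pixels}}{\text{degree}} = 128\ \text{pixels}.
\]
```

A 1 degree orientation error therefore displaces the projected terrain by a quarter of the image width.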

Consequently,  a  normal  part  of  current  SSV  opera¬ 
tion  is  something  called  HOC  (Horizon  Line  Orientation 
Correction)  or  LOC  (Landmark  Orientation  Correction). 
This  is  a  by-hand  procedure  in  which  known  points  on 
the  horizon  or  surveyed  points  on  the  terrain  map  are 
fed  to  the  RSTA  system  in  order  that  it  can  recover,  to 
within  several  pixels,  the  true  relationship  between  the 
stored  terrain  map  and  the  live  video  imagery.  Clearly, 
this  is  done  once  the  vehicle  is  stopped  at  an  observa¬ 
tion  point,  and  must  be  repeated  each  time  the  vehicle 
is  moved. 

While  the  need  for  such  precise  registration  may  at 
first  not  be  obvious,  it  is  a  pre-condition  for  the  use  of 
most  FLIR  target  detection  algorithms.  This  is  because 
these  algorithms  require  an  initial  range-to-target  esti¬ 
mate  for  every  pixel  in  the  FLIR  image.  In  the  absence 
of  an  active  ranging  sensor,  these  estimates  can  be  de¬ 
rived  from  terrain  maps,  but  only  if  precise  registration 
is  established. 

Recently,  Christopher  Graves  and  Christopher  Lesher 
have  begun  work  on  automating  the  terrain-to-imagery 
registration  task.  A  proof-of-concept  horizon  line  match¬ 
ing  experiment  has  been  conducted  using  imagery  col¬ 
lected  by  Lockheed-Martin  at  the  UGV  Demo  C  test  site. 
The geometric matching system originally developed as part
of Beveridge’s thesis work [Bev93] is being used in this
test, and results are presented elsewhere in these
proceedings [J. 96]. The results prove the feasibility of using
local  search  matching  as  a  tool  to  automate  the  orienta¬ 
tion  correction  process  in  domains  where  horizon  struc¬ 
ture  is  distinctive. 